Skip to content

Conversation

@itsdebartha
Copy link
Contributor

This PR addresses #868 . Adds the method for a weighted mean of elements transformed by a function.

  • Added mean(f, itr, weights)
  • Added tests for the method

Adds the method for a weighted mean of elements transformed by a function.

- Added `mean(f, itr, weights)`
- Added tests for the method
Added tests for UnitWeights
src/weights.jl Outdated
mean(f, A::AbstractArray, w::AbstractWeights; dims::Union{Colon,Int}=:) =
_mean(f.(A), w, dims)

function mean(f, A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a slightly more generic version of the same thing, which is less likely to require maintenance in the future (if we add additional keyword arguments to mean). I suggest using this pattern when you can.

Suggested change
function mean(f, A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:)
function mean(f, A::AbstractArray, w::UnitWeights; kwargs...)

src/weights.jl Outdated
function mean(f, A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:)
a = (dims === :) ? length(A) : size(A, dims)
a != length(w) && throw(DimensionMismatch("Inconsistent array dimension."))
return mean(f.(A), dims=dims)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid memory allocation by using mean(f, A) instead of mean(f.(A)). Remember that f.(A) creates an extra array, which is slow. Memory access is usually the biggest bottleneck on modern CPUs. mean(f.(A)) is 2 separate operations: The first one creates a new array, f.(A), and the second calculates its mean. mean(f, A) calculates the mean of (f(x) for x in A) directly, as one operation, without creating a new array.

Suggested change
return mean(f.(A), dims=dims)
return mean(f, A; dims)

I'd also suggest making it slightly more generic, as

Suggested change
return mean(f.(A), dims=dims)
return mean(f, A; kwargs...)

src/weights.jl Outdated
end

"""
mean(f, A::AbstractArray, w::AbstractWeights[, dims::Int])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mean(f, A::AbstractArray, w::AbstractWeights[, dims::Int])
mean(f, A::AbstractArray, w::AbstractWeights[; dims])

dims shouldn't be required to be an integer.

src/weights.jl Outdated
```
"""
mean(f, A::AbstractArray, w::AbstractWeights; dims::Union{Colon,Int}=:) =
_mean(f.(A), w, dims)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid memory allocation by using mean(f, A) instead of mean(f.(A)). Remember that f.(A) creates an extra array, which is slow. Memory access is usually the biggest bottleneck on modern CPUs. mean(f.(A)) is 2 separate operations: The first one creates a new array, f.(A), and the second calculates its mean. mean(f, A) calculates the mean of (f(x) for x in A) directly, as one operation, without creating a new array.

Suggested change
_mean(f.(A), w, dims)
_mean(f, A; dims)

I'd also suggest making it slightly more generic, as

Suggested change
_mean(f.(A), w, dims)
_mean(f, A; kwargs...)

(See below for more details.)

- Add keyword arguments for the weights
- Modified functions to use `Iterators.map`
- Add more tests
src/weights.jl Outdated
```
"""
mean(f, A, w::AbstractWeights; kwargs...) =
mean(broadcast(f, A), w; kwargs...)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I said, check the method for weighted sem to see how to use Broadcasting.broadcasted correctly. broadcast is not the same thing as Broadcasting.broadcasted. The latter is lazy.

All checks not passed
- Removed implementation for multi-dimensional array
- Updated documentations
- Updated tests
src/weights.jl Outdated
Comment on lines 704 to 706
return sum(Broadcast.broadcasted(f, A, w) do f, a_i, wg
return f(a_i) * wg
end) / sum(w)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use Broadcast.instantiate. Moreover, to me it seems this should be changed to

Suggested change
return sum(Broadcast.broadcasted(f, A, w) do f, a_i, wg
return f(a_i) * wg
end) / sum(w)
return sum(Broadcast.instantiate(Broadcast.broadcasted(A, w) do a_i, wg
return f(a_i) * wg
end)) / sum(w)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Broadcast.instantiate causes extra allocations, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tried. But without Broadcast.instantiate it will be slow and fall back to Cartesian indexing. See, e.g., JuliaLang/julia#31020 ("we require Broadcast.instantiate for fast reduce").

Used `Broadcast.instantiate` as requested to overcome falling back to Cartesian indexing
src/weights.jl Outdated
```
"""
mean(f, A::AbstractArray, w::AbstractWeights) =
_funcweightedmean(f, A, w)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember, was there any particular reason for introducing _funcweightedmean instead of implementing two methods for mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No there wasn't any such reason. I am removing this function, and implementing it as a method for mean

Instead deploy it as a method for `mean`
@itsdebartha
Copy link
Contributor Author

@devmotion @ParadaCarleton gentle ping, is this PR ready to merge?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants